Loadstar 242 / 242.d81 / t.pack about
S T A R P A C K E R 1 . 2
by Lee Novak
[ALL SECRETS REVEALED]
[--------------------]
This article is for anyone who is
interested in how this packer works.
While the actual code was very
difficult to write, the ideas behind
it are rather simple.
You might think: "How can a file
be compressed? How can you construct
a file without having all the data?"
It doesn't seem possible.
The idea I got from LS #135's
article on LZS COMPRESSION was that
any duplicated data can be packed.
What's needed is a way to signal when
something is to be repeated. You also
need to tell the unpacker where the
original data is, and how much of it
to repeat.
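As an illustration of that idea only (this is Python, not the packer's actual 6502 code, and the names are mine), a back-reference copy looks like this:

```python
def copy_back(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes copied from `distance` bytes back in
    the output built so far. Copying byte-by-byte lets the copy
    overlap itself, so a distance of 1 just repeats the last byte."""
    start = len(out) - distance
    for i in range(length):
        out.append(out[start + i])

out = bytearray(b"abcabc")
copy_back(out, 3, 3)   # repeat the "abc" found 3 bytes back
```

All the unpacker ever needs from the packed stream is that pair of numbers: how far back, and how many.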
How you go about this doesn't
matter, just as long as it works! The
LZS article talks of an unpacker that
reads from a stream of bits only.
This means that specific information
(where repeat data is, etc.) might be
split up between two or more bytes.
I'm guessing that this method of
varying bit-lengths provides the best
compression, and maybe that's why the
BIT IMPLODER is so good. But it sure
sounded difficult. I didn't consider
using this method for long.
[THE THEORY OF RARES]
[-------------------]
I knew this packer was going to
use simpler byte-sized pieces of
data. Every byte the unpacker would
read in could have any value 0-255.
At least one value must be reserved
to tell the unpacker that compression
information is to follow.
What value to use? Zero? No way!
There are lots of zeroes in almost
every file! What about a number that
is hardly ever used? Better. But some
files would use this number more than
others. What was needed was a custom
value for each file.
How about counting the number of
occurrences for each byte? Aha! The
least common byte, the RARE byte, is
to signal when compression is used.
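Finding the RARE bytes is just a frequency count. A sketch in Python (the function name and byte-count are mine; the packer itself picks eight RARES, as described below):

```python
from collections import Counter

def find_rares(data: bytes, how_many: int = 8) -> list:
    """Return the `how_many` least common byte values, rarest
    first. Values that never occur in the file at all are the
    best candidates of all."""
    counts = Counter(data)
    # include every possible value so unused bytes count as zero
    all_counts = [(counts.get(b, 0), b) for b in range(256)]
    all_counts.sort()
    return [b for _, b in all_counts[:how_many]]

# bytes 0 and 1 never occur in this sample, so they win
rares = find_rares(b"aaabbc", 2)
```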
The unpacker would read in the
byte following the RARE and know it
has special meaning. But there are so
many possible ways that information
might be packed. How would you tell
the unpacker to duplicate five bytes
(matches) located 117 bytes back?
If two bytes followed the RARE,
you COULD represent this information.
But then, how would you tell the
unpacker to copy 37 bytes located
9155 bytes back? We surely don't want
to pass up the savings there! Hmm,
maybe we need three bytes to follow
the RARE. One byte would represent
the number of matches, and the next
two tell how far back to get them.
So now, we would need FOUR bytes
to signify even the shortest packing:
one RARE then three bytes. So, what
advantage is there in "compressing" a
string of four matches? None at all.
But there's bound to be lots and lots
of four-byte matches in a file! Can't
we use ANY of them?
Hey, what if we use another RARE?
The unpacker could check if a byte is
either of these RARES, and follow a
different unpacking formula for each.
Why not three RARES? Okay! Why not
four, or five, or even twenty RARES?
Two problems with this. One: the
unpacker code grows with each RARE
you decide to add, because each RARE
represents a compression all its own.
And two: all normal, uncompressed
bytes are just written to memory as-
is. But sometimes, an uncompressed
RARE is part of the file. On these
occasions, TWO bytes are needed to
represent that one byte. A RARE to
say "Hey, here comes a rare!" and one
byte to say "We want this one!".
The second RARE is slightly less
"rare" than the first. The third is
even less so. Eventually, the bytes
may become so common that it costs
MORE to represent them as two bytes
each than the potential savings using
another type of compression may add.
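That trade-off can be put as a one-line rule of thumb (my formulation, not a routine from the packer): a RARE is worth keeping only while the bytes it saves exceed the one extra escape byte each of its literal occurrences now costs.

```python
def rare_net_savings(literal_count: int, uses: int,
                     bytes_saved_per_use: int) -> int:
    """Net bytes saved by adding one more RARE: each compressed
    use saves some bytes, but every literal occurrence of that
    value must now be written as TWO bytes instead of one."""
    return uses * bytes_saved_per_use - literal_count

# 20 uses saving 1 byte each, but 50 literal occurrences: a loss
loss = rare_net_savings(literal_count=50, uses=20, bytes_saved_per_use=1)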
[WHAT EACH RARE DOES]
[-------------------]
I did a lot of experimentation
using different numbers of RARES and
ways of packing the information. The
most favorable all-purpose method of
packing turned out to be one that
uses the top eight RARES.
Pretend you're an unpacker. You
read along, splatting uncompressed
bytes down without thought. Then you
come across one of these RARE bytes.
It's time to take the next byte, or
bytes, more seriously. Here's EXACTLY
what to do in these situations!
RARE1, BYTE1:
% 0000 0000
  \_______/
3 matches, 3-258 bytes away
Every possible value for BYTE1 is
useful. ONE byte is saved for each
3-byte match. There were so many of
these that a second RARE was added:
RARE2, BYTE1:
% 0000 0000
  \_______/
3 matches, 259-514 bytes away
Gathering 3-byte matches wasn't
profitable after that, so all others
are ignored.
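Pretending to be the unpacker again, the RARE1/RARE2 cases decode like this (a Python sketch under the ranges above; names are mine):

```python
def decode_short(out: bytearray, rare_index: int, byte1: int) -> None:
    """RARE1: 3 matches, 3-258 bytes away.
    RARE2: 3 matches, 259-514 bytes away.
    BYTE1 (0-255) is added to the base distance."""
    base = 3 if rare_index == 1 else 259
    start = len(out) - (base + byte1)
    for i in range(3):
        out.append(out[start + i])

out = bytearray(b"xyzabc")
decode_short(out, 1, 3)   # copy 3 bytes from 6 bytes back
```

Two bytes in the stream stand for three bytes of data, so each one saves exactly one byte, as the article says.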
RARE3, BYTE1:
% 0000 0000
  \_______/
4 matches, 4-259 bytes away
Each one of these saves TWO
bytes. There are still larger matches
close to the target string, but they
are becoming less common. Time to
break down the data byte into bits:
RARE4, BYTE1:
  +--------- 0 = 5 matches
% 0000 0000  1 = 6 matches
   \______/
5-132 bytes away
or 6-133 bytes away
RARE5, BYTE1:
  +--------- 0 = 7 matches
% 0000 0000  1 = 8 matches
   \______/
7-134 bytes away
or 8-135 bytes away
These 2-byte compressions save
between 3 and 6 bytes apiece. Things
are thinning out even more now, but
we still want anything that will make
a profit. Let's add a data byte:
RARE6, BYTE1, BYTE2:
% 0000 0000  % 0000 0000
  \__/ \__/    \_______/
   |   high    low byte
   |   nybble
   |   \_______________/
   |   0-4095 bytes away
   |
 4-19 matches
RARE7, BYTE1, BYTE2:
% 0000 0000  % 0000 0000
  \__/ \__/    \_______/
   |   high    low byte
   |   nybble
   |   \_______________/
   |   4096-8191 bytes away
   |
 4-19 matches
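Decoding those two-nybble cases is a matter of splitting BYTE1, a sketch under the layout above (again Python, with names of my own choosing):

```python
def decode_medium(out: bytearray, rare_index: int,
                  byte1: int, byte2: int) -> None:
    """RARE6/RARE7: the high nybble of BYTE1 gives 4-19 matches;
    the low nybble of BYTE1 is the high nybble of the distance
    and BYTE2 is its low byte. RARE7 adds 4096 to the distance."""
    matches = (byte1 >> 4) + 4
    distance = ((byte1 & 0x0F) << 8) | byte2
    if rare_index == 7:
        distance += 4096
    start = len(out) - distance
    for i in range(matches):
        out.append(out[start + i])

out = bytearray(b"hello world")
decode_medium(out, 6, 0x00, 11)   # 4 matches, 11 bytes back
```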
These 3-byte compressions save
between 1 and 16 bytes apiece. We
still need one more RARE to handle
the rest of the possibilities:
RARE8, BYTE1 (0):
End of compression. RUN program.
RARE8, BYTE1 (1-8):
Add a specific uncompressed RARE.
RARE8, BYTE1 (9), BYTE2:
This unusual type of compression
is for font data ONLY. If the exact
REVERSE of the target string can be
found exactly 1K earlier in the file,
just like reversed font data, then we
go back to copy 4-255 REVERSE bytes
as specified in BYTE2.
This EOR compression is probably
only useful to STAR LINKED files. The
cost of the related unpacking code is
34 bytes. This cost is recovered with
the first 5 characters that can be
packed this way!
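I read "REVERSE" here as C64 reverse video, i.e. each byte EORed with $FF, which is why the article calls it EOR compression; a font and its reversed twin sitting 1K apart differ only by that inversion. A sketch under that assumption:

```python
def decode_rvs_font(out: bytearray, count: int) -> None:
    """RARE8 type 9: copy `count` bytes from exactly 1K (1024
    bytes) back, inverting each with EOR $FF -- the reversed
    image of font data seen 1K earlier."""
    start = len(out) - 1024
    for i in range(count):
        out.append(out[start + i] ^ 0xFF)

out = bytearray(1024)        # 1K of zeroes stands in for a font
decode_rvs_font(out, 4)      # append the inverted bytes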
RARE8, BYTE1 (10-127), BYTE2, BYTE3:
%00000000  %00000000  %00000000
 |\______/  \_______/  \_______/
 |  5-122   high byte  low byte
 | matches  \_________________/
 |          0-65535 bytes away
 always 0
All remaining profitable matches
are compressed this way. Since using
this method costs 4 bytes each time,
at least 5 characters must be packed
with it to save at least one byte.
RARE8, BYTE1 (128-255), BYTE2:
%00000000  %00000000
 |\______/  \_______/
 |  4-131   byte to
 | repeats  repeat
 |
 always 1
This last one is known as RLE, or
Run-Length-Encoding. It can be pretty
useful for packing fonts, sprites, or
anything which may have 4 or more
identical bytes in a row.
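The RLE case decodes in a couple of lines, sketched here in Python under the bit layout above:

```python
def decode_rle(out: bytearray, byte1: int, byte2: int) -> None:
    """RARE8 with BYTE1 in 128-255: the low 7 bits of BYTE1
    give 4-131 repeats, and BYTE2 is the byte to repeat."""
    repeats = (byte1 & 0x7F) + 4
    out.extend(bytes([byte2]) * repeats)

out = bytearray()
decode_rle(out, 128, 0x41)   # minimum run: 4 copies of "A"
```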
Now that you know what does what, you
can make better sense of the
information on the packing screen.
Here are the
RARES one last time, with the
categories they fall into:
RARE1 - RARE5 = "short"
RARE6 - RARE7 = "medium"
RARE8 with 10-127 = "long"
RARE8 with 128-255 = "repeat"
RARE8 with 9 = "rvs font"
The number of uncompressed RARES
in the file isn't shown anywhere.
[TOIL AND TROUBLE]
[----------------]
Why is this packer so slow?
Actually it's not. You won't BELIEVE
how much work is going on internally!
The packer begins at the end of a
file and works backwards. It searches
for the MOST MATCHES as CLOSE TO the
target string as possible.
The packer begins this search for
each 3-byte string a certain distance
back from the string. How far back it
goes depends on the CRUNCH MODE:
Crunch Mode   How Far Back
-----------   ------------
     1        1 page  (256 bytes)
     2        2 pages (512 bytes)
     3        4 pages (1K)
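That search is where all the time goes: for every 3-byte string, the packer scans the whole window for the longest match, preferring the closest one on ties. A sketch of one such scan (Python, my names; the real thing is 6502 code working backwards through the file):

```python
def find_best_match(data: bytes, pos: int, window: int,
                    max_len: int = 19):
    """Scan up to `window` bytes back from `pos` for the longest
    match (capped at `max_len`), preferring the closest start on
    ties. Returns (length, distance)."""
    best_len, best_dist = 0, 0
    lo = max(0, pos - window)
    for start in range(pos - 1, lo - 1, -1):  # closest first
        length = 0
        while (length < max_len and pos + length < len(data)
               and data[start + length] == data[pos + length]):
            length += 1
        if length > best_len:   # strict: ties keep the closer one
            best_len, best_dist = length, pos - start
    return best_len, best_dist

best = find_best_match(b"abcabcabc", 3, 256)
```

Doing this for every position in the file, over windows of up to 1K, is why so much work is going on internally.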